[Storage cleaner] Add wandb path implementation #400

2015aroras · 2023-12-14T00:15:40Z

This PR adds a basic implementation for getting a wandb path for a given run directory. This may have issues in some common cases:

If the run used save_overwrite, so that the top-level config.yaml does not match all checkpoints, then the resulting wandb path won't be correct for every checkpoint of the "run". Dealing with this programmatically seems messier than it is worth.
If the run does not have a top-level config.yaml, then the script exits with an error. Using a checkpoint's config.yaml instead could lead to the previous scenario.
If the run has been deleted from wandb, then the script exits with an error. We can probably deal with this manually?
If the run has under-specified wandb information in the config file, then the run cannot be identified from the config alone. One can probably deduce the right run from other information, but I don't think this is worth it presently.

I am hoping that most of the runs worth keeping do not face the above issues. I can improve how the storage cleaner deals with these scenarios if we believe they are more important than I had expected.
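To make the failure cases above concrete, here is a minimal sketch of the validation step, assuming the parsed config.yaml is a plain dict with a `wandb` section holding `entity`, `project`, `name`, and an optional `group`. Those key names, the `WandbPathError` class, and the `wandb_lookup_keys` helper are hypothetical and are not taken from scripts/storage_cleaner.py.

```python
from typing import Any, Mapping, Optional


class WandbPathError(Exception):
    """Raised when a run directory cannot be mapped to a W&B run (hypothetical)."""


def wandb_lookup_keys(config: Optional[Mapping[str, Any]]) -> dict:
    """Extract the W&B fields needed to look up a run from a parsed config.yaml.

    `config` is None when the run directory has no top-level config.yaml.
    """
    if config is None:
        raise WandbPathError("run directory has no top-level config.yaml")

    # Assumed layout: the W&B settings live under a "wandb" key.
    wandb_section = config.get("wandb") or {}
    required = ("entity", "project", "name")
    missing = [key for key in required if not wandb_section.get(key)]
    if missing:
        raise WandbPathError(f"under-specified wandb config, missing: {missing}")

    return {
        "entity": wandb_section["entity"],
        "project": wandb_section["project"],
        "name": wandb_section["name"],
        "group": wandb_section.get("group"),  # optional in this sketch
    }
```

Raising a dedicated exception lets the caller decide whether to exit or to skip the run; the description above only says that the script exits with an error.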

epwalsh

I don't think resolving the W&B run based on the W&B run name is going to work well in practice, at least for my runs where I typically use the same W&B name for each restart.

Alternatively:

Use the existing name/group filters to get a set of potential matches
Then filter those matches by comparing all fields in the config

This should work since we save the train config to the W&B run config.
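The two-step approach suggested here can be sketched as follows. This is an illustration only: candidate runs are modeled as plain dicts with `name`, `group`, and `config` keys standing in for W&B run objects, and `find_matching_runs` is a hypothetical helper, not the function added in this PR.

```python
from typing import Any, Dict, List, Optional

# Stand-in for a W&B run: {"id": ..., "name": ..., "group": ..., "config": {...}}
Run = Dict[str, Any]


def find_matching_runs(
    candidates: List[Run],
    name: str,
    group: Optional[str],
    train_config: Dict[str, Any],
) -> List[Run]:
    """Narrow candidates by name/group, then by full config equality."""
    # Step 1: the name/group filter yields the set of potential matches.
    potential = [
        run
        for run in candidates
        if run["name"] == name and (group is None or run.get("group") == group)
    ]
    # Step 2: keep only runs whose saved config equals the training config.
    return [run for run in potential if run["config"] == train_config]


runs = [
    {"id": "a1", "name": "exp", "group": "g", "config": {"seed": 1}},
    {"id": "b2", "name": "exp", "group": "g", "config": {"seed": 2}},
    {"id": "c3", "name": "other", "group": "g", "config": {"seed": 1}},
]
matches = find_matching_runs(runs, name="exp", group="g", train_config={"seed": 1})
```

With restarts that reuse the same W&B name, step 1 alone returns both `a1` and `b2`; the config comparison in step 2 is what separates them.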

scripts/storage_cleaner.py

Co-authored-by: Pete <[email protected]>

2015aroras · 2023-12-14T00:56:53Z

This would help filter out some runs but I assume that you sometimes use the exact same config for several runs (I have in the past)? Are you concerned about that case too?

epwalsh · 2023-12-14T19:08:28Z

I probably have some runs like this due to failures, but in general I try to remove failed runs from W&B immediately. When this script finds several matching runs, we could print out links to each of them so the user can check whether some of the duplicates can be removed from W&B.
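A small sketch of that idea, assuming the run page URL follows the `https://wandb.ai/<entity>/<project>/runs/<run_id>` pattern and that a run path has the form `<entity>/<project>/<run_id>`. The helper names `run_url` and `resolve_or_report` are hypothetical.

```python
from typing import List, Optional


def run_url(entity: str, project: str, run_id: str) -> str:
    """Build a browser link for a run (assumed URL pattern)."""
    return f"https://wandb.ai/{entity}/{project}/runs/{run_id}"


def resolve_or_report(entity: str, project: str, matching_run_ids: List[str]) -> Optional[str]:
    """Return the run path if exactly one run matches; otherwise report and return None."""
    if len(matching_run_ids) == 1:
        return f"{entity}/{project}/{matching_run_ids[0]}"

    if not matching_run_ids:
        print("No matching W&B run found.")
        return None

    # Several runs share the same name, group, and config: list them so the
    # user can inspect them and delete duplicates from W&B by hand.
    print(f"Found {len(matching_run_ids)} matching runs:")
    for run_id in matching_run_ids:
        print("  " + run_url(entity, project, run_id))
    return None
```

Returning `None` rather than guessing keeps the ambiguous case a manual decision, in line with the comment above.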

2015aroras · 2023-12-14T23:04:44Z

223eb6a Filtering by config took ~90 lines.

epwalsh · 2023-12-15T00:30:31Z

Hmm you might be able to simplify that a bit by trying to load the W&B config into a TrainConfig?

2015aroras · 2023-12-15T19:13:43Z

b4ac61e Somehow I didn't even think about this. It looks like it works fine.
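The simplification discussed here, loading both configs into the same typed config class and comparing the resulting objects, can be sketched with a stand-in dataclass. `TrainConfigSketch` and its fields are hypothetical and far smaller than the real TrainConfig. Dropping unknown keys is a choice made for this sketch, not something the PR states.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass
class TrainConfigSketch:
    """Tiny stand-in for the real TrainConfig (fields are illustrative)."""

    run_name: Optional[str] = None
    seed: int = 0
    save_folder: str = "./"

    @classmethod
    def from_dict(cls, raw: Mapping[str, Any]) -> "TrainConfigSketch":
        # Keep only keys the config class knows about; missing keys fall
        # back to the dataclass defaults.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in raw.items() if k in known})


def configs_match(wandb_config: Mapping[str, Any], train_config: Mapping[str, Any]) -> bool:
    """Compare two raw config dicts after loading both into the same class."""
    return TrainConfigSketch.from_dict(wandb_config) == TrainConfigSketch.from_dict(train_config)
```

Because dataclass equality compares every field, this replaces a hand-written field-by-field comparison, and defaults are applied identically to both sides before the comparison.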

2015aroras added 3 commits December 13, 2023 15:49

Add logic for getting wandb path from run dir

42fc50d

Add trailing / for dirs to avoid future problems

70645bd

Add some extra logging for wandb path logic

b67c49f

2015aroras requested a review from epwalsh December 14, 2023 00:15

2015aroras added 2 commits December 13, 2023 16:16

Run ruff

2d63b04

Merge branch 'main' into shanea/storage-cleaner-wandb

5c32f67

epwalsh requested changes Dec 14, 2023

Change wandb path logging from log to debug

47f1af6

Co-authored-by: Pete <[email protected]>

Merge branch 'main' into shanea/storage-cleaner-wandb

053c4b1

2015aroras added 3 commits December 14, 2023 14:58

Add ability to get wandb run info from wandb directory

bacb7b1

Verify that the wandb run config matches the training config

223eb6a

Run ruff

4cc662a

2015aroras added 2 commits December 15, 2023 10:56

Use TrainConfig for comparing both wandb and run config

b4ac61e

Run ruff

b89d702

epwalsh approved these changes Dec 15, 2023

Merge branch 'main' into shanea/storage-cleaner-wandb

b7a3c66

2015aroras merged commit 685d11b into main Dec 15, 2023
10 checks passed

2015aroras deleted the shanea/storage-cleaner-wandb branch December 15, 2023 20:59
